-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OperatorSDK enhancement with best practices and OKT proposition #84
base: master
Are you sure you want to change the base?
Conversation
To follow up our (@orange) exchange with Tony Wu and Eric Stroczynski. |
- In the Controller folder I use the OKT code generator for my 3 resources I want to create. i.e.: | ||
`okt-gen-resource xxxxx` | ||
- In each file generated, I always find 3 parts to fill by myself and that will constitute a common/standard way to Mutate my resource: | ||
- A method called `getHashXXX() { // Put your code here }` - It allows me to define, thanks to an OKT Helper, wich part of my object I want to include in the hash computation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the hash's input created? Marhsaling the object itself? I've seen quite a few cases where a time or resourceVersion
value is included in a hash/comparison, with an always-true result, so exposing such a function may be a source of confusion and bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly for each resource, you have to define in the "getHashXXX()" method, the scope of objects/elements covered by the hash computation. OKT provides you an helper to define this scope. Internally it use an array of pointers with all the elements participating to the scope (the metadata you want, the whole Object's Spec part if you want, actually whatever you want).
You return this "bag" of collected elements through the return statement of "getHashXXX()' method.
To perform the Hash computation, OKT use the spew library (github.com/davecgh/go-spew/spew
) which follows pointers and prints actual values of the nested objects ensuring the hash does not change when a pointer changes.
The returned hash can be used for object comparisons.
So if the developper don't add resourceVersion
in the scope, there's no risk to meet the bug you mentionned. I know that some immutable fields (IP in Services, ...) and the Secret's data are things to take care about (OKT helper help on this point).
There is an example in the operator Memcached with OKT
I sent to you and Tony (it can be shared).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it sounds like the hash is defined by the developer and they can choose as much or as little in their hash. And I agree with Eric that if someone unknowingly adds an attribute that changes it will cause an infinite reconcilation loop. (we've done this ourselves :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if someone unknowingly adds an attribute that change it will cause an infinite reconcilation loop
OKT comes with an integrated throttling mechanism in case of a same recurrent errors, but not sure it can help in such case...
Besides that, when the developer defines the hash in the "getHashXXX()" method, he uses an OKT helper generic or dedicated for the resource kind he is dealing with. For example, the SecretMutationHelper object manages few things with StringData and Data fields. It ensures too that the hash computation covers only the Data field and not the StringData field.
Here again, I don't know if there's a more elegant solutions and if a such helper is sufficient to guarantee the problem of the changing attributes added unknowingly.
The counterpart of these helpers, implemented by OKT, is the need to maintain them. Today, the only specific one that exists (the others are generic helpers), is the SecretMutationHelper. I thought to need another one specific for the Service resource and its changing ClusterIP, but in practice all works correctly without special care, it seems it was a mistake to think that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited the last sentence regarding specifics vs generics helpers.
1. I create a project with the Operator SDK command as usual | ||
2. In my Controller, I choose an OKT Reconciler and add the Reconciliation steps function in which I can see distinctly the standard steps in which I should pass at each reconciliation event (each step below is an entry in an embracing `switch(step) { case "xxx": ... }`: | ||
- **CR Checker** - once the CR is picked up by OKT and validated by webhooks (if any), I can tell here if I have something to do more on my CR | ||
- **ObjectGetter** - Here I will add the code to create my 3, in memory, resources and add them to the OKT's registry of resources (still in memory). However at this stage OKT pick up the resource on the K8S Cluster so it knows if the resource in its registry has an existing peer on the Cluster (or cache). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you mean that the operator registers 3 types with OKT, this sounds a lot like controller-runtime's cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit don't know lot of things about the controller-runtime's cache implementation.
At OKT level it is an array of a generic resource type, that OKT uses at the "Mutator" and "Updater" reconciliation steps to loop on each resource and call respectively the "getHashXXX(), MutateWithInitialData(), MutateWithCR()" (for mutation) and a specific "CreateOrUpdate()" OKT method.
|
||
Today, the Story 1 is fully operational and implementable with the current version of OKT. A partial implementation detail can be provided by the MemcachedOperatorSampleWithOKT repo code. | ||
|
||
The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the concept of a controller state machine makes a lot of sense, and one generic enough to register an arbitrary application lifecycle for an operator to execute would be powerful. I think such a library warrants an enhancement on its own, with a prototype ideally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thank you, the concept is interresting and I have some ideas to render the LC graph implementation generic at some points. Note that the K8S resources reconciliation steps (CRChecker(), ObjectsGetter(), ..., SuccessManager() ) are already implemented thanks to a StateMachine library (Qor transition), it is a bit overrated because initialy I thought about a more complex reconciliation process, but it's done with that right now.
It is clear now, that it is better to separate in 2 distincts processes the K8S resources reconciliation and the Application's LC graph stages.
Besides that, some ideas comes in mind. We could manage the state changes through a standard events formalizations as the one brought by the CloudEvents specs and lib. Providing a standard way to observe and pilot the state change seems interesting. So we already are looking for integrating CloudEvents in OKT for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 agree that this state machine idea sounds quite interesting.
|
||
The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not. | ||
|
||
The OKT library is a GO module that depends on the OperatorSDK, more specifically on the sigs and k8s.io modules aligned on those used by the OperatorSDK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify exactly what you mean by "depends on the OperatorSDK"? The operator-sdk repo contains a binary that scaffolds code from controller-runtime, with no public libraries; perhaps you mean OKT depends on the latter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there's no real dependancies except that OKT has no real sense without the usage of the OperatorSDK as OKT is done to be used inside an OperatorSDK's scaffolded project.
Besides that, each time the librairies used by the OperatorSDK evolves (sigs, ...), we have to align the OKT librairy as well.
@tapairmax once the OKT library is public, can you link it here? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few changes to fix some spelling errors. Overall, a great proposal.
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
Co-authored-by: Jesus Rodriguez <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal seems really interesting. It would be helpful to have the additional capabilities of OKT integrated in SDK or controller-runtime. Is there any place we can take a look at the sample Memcached Operator built using OKT (which is mentioned in proposal) ?
- **Updater** - Here OKT, thanks to a hash algorithm on resources data, will compute if after Mutation, the resource has to be Created, Update, or is unchanged against it's cluster peer instance. So for all resource in its registry it apply the same idempotent process to update (or not!!) the resource. | ||
- **SuccessManager** - If all goes right (no error) , this stage is reached, and here I'll can tell to OKT to "manageSuccess()" i.e. check for the right requeuing value to return and complete this reconciliation but at same time perform the update of a status condition for the Reconciliation, transparently for me, if I passed the CR.Status when I instatiate my OKT Reconciler object. | ||
- At any previous stage, if something goes wrong, an error can be raised, there's also the case where we giveup the reconciliation for whatever reason (not an error). There is also the case where a CR Finalization is triggered. These "debranching" cases are also present in my Reconciliation steps function thanks to 2 dedicated steps: | ||
- **CRFinalizer** - here I can perform my own tasks when OKT detected that a finalizer (with the right name) exists and the CR is being deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this similar to the finalizer helpers present in controller-runtime - https://github.com/kubernetes-sigs/controller-runtime/tree/master/pkg/finalizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the OKT Reconciler just check if a finalizer in CR Metadata with the same name as itself exists. If it's the case and the CR is being deleted, the stage "CRFinalizer" is called right after the stage "CRChecker" instead of continuing the normal process (ObjectsGetter, ...).
The principle is to do what you need in the CRFinalizer stage and call a provided method to delete the finalizer.
So it is just a way to manage a single finalizer on the CR.
I guess the helper you mentioned is generic as well to manage several finalizers...
- **SuccessManager** - If all goes right (no error) , this stage is reached, and here I'll can tell to OKT to "manageSuccess()" i.e. check for the right requeuing value to return and complete this reconciliation but at same time perform the update of a status condition for the Reconciliation, transparently for me, if I passed the CR.Status when I instatiate my OKT Reconciler object. | ||
- At any previous stage, if something goes wrong, an error can be raised, there's also the case where we giveup the reconciliation for whatever reason (not an error). There is also the case where a CR Finalization is triggered. These "debranching" cases are also present in my Reconciliation steps function thanks to 2 dedicated steps: | ||
- **CRFinalizer** - here I can perform my own tasks when OKT detected that a finalizer (with the right name) exists and the CR is being deleted. | ||
- **ErrorManager** - This stage can be reached after any other stages, and let me call the OKT "manageError()" method that will pickup the last error and return the right requeuing value to complete this reconciliation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to explain in detail on how the OKT Reconciler
is different from the generic controller-runtime reconciler which is used in operator-sdk ? In the sense the aspects which are missing from the current implementation or the additional features which OKT Reconciler implements to fill the gaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes a good explanation by a sample would help. This the the reason I'll provide a more complete version of the Memcached operator sample. I believed Eric S. got a tarball with the first version of Memcached aligned with the "User Story 1" in this issue, feel free to go it.
Now, I'm very close to get the official GO to publish OKT on Github, I think there's no issue to drop a new Memcached operator sample under my Github space. I'll try to release it nex week.
OKT comes with a statemachine feature that should help in defining these steps and let me focus on the code I need to implement at each step. | ||
To allow this, OKT provides: | ||
|
||
- a sidecar for my application to help me to get my database status and launch actions on it asynchronously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be really useful. But just curious, would conditions be useful here to expose the status of CR. Please correct me if this is totally tangential to what is being described here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to well capture your point but I guess you're wondering if the CR status is the right place to store the conditions that represent the application's state.
I had in mind the ability to offer a simple way to follow the application state thanks to a "kubectl wait" command for example. But actually it sounds a bit unsuitable to me too, so may be that was the reason of your question :).
So it's a point still unstated today. Besides that, building a specific interface relying on the CloudEvent api could be a better option either in term of interoperability with external world and may be for others internal needs too...
You can ask me access to the memcached operator sample implemented with OKT. |
Here we are ! We had the GO to publish in Open Source community the "Operators Karma Tools" and it is DONE. It is published in Orange Github here. The Memcached operator with OKT can both be tested. As long as OTK resides on the Orange repo, it could only evolve slowly. Last edit: adjust a bit the message. |
No description provided.